OPEN ACCESS Freely available online

PLOS ONE



Scaling in Transportation Networks 

Rémi Louf (1), Camille Roth (2,3), Marc Barthelemy (1,3)

(1) Institut de Physique Théorique, CEA-CNRS (URA 2306), Gif-sur-Yvette, France, (2) Centre Marc Bloch Berlin (An-Institut der Humboldt-Universität, UMIFRE CNRS-MAE),
Berlin, (3) Centre d'Analyse et de Mathématique Sociales, EHESS-CNRS (UMR 8557), Paris, France






Abstract 

Subway systems span most large cities, and railway networks most countries in the world. These networks are fundamental 
in the development of countries and their cities, and it is therefore crucial to understand their formation and evolution. 
However, if the topological properties of these networks are fairly well understood, how they relate to population and 
socio-economical properties remains an open question. We propose here a general coarse-grained approach, based on a 
cost-benefit analysis that accounts for the scaling properties of the main quantities characterizing these systems (the 
number of stations, the total length, and the ridership) with the substrate's population, area and wealth. More precisely, we 
show that the length, number of stations and ridership of subways and rail networks can be estimated knowing the area, 
population and wealth of the underlying region. These predictions are in good agreement with data gathered for about 140 
subway systems and more than 50 railway networks in the world. We also show that train networks and subway systems 
can be described within the same framework, but with a fundamental difference: while the interstation distance seems to 
be constant and determined by the typical walking distance for subways, the interstation distance for railways scales with 
the number of stations. 
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Introduction 

Almost 200 subway systems run through the largest agglomerations
in the world and offer an efficient alternative to congested
road networks in urban areas. Previous studies have explored the 
topological and geometrical static properties of these transit 
systems [1-5], as well as their evolution in time [6-8]. However, 
subways are not mere geometrical structures growing in empty 
space: they are usually embedded in large, highly congested urban 
areas and it seems plausible that some properties of these systems 
find their origin in the interaction with the city they are in. 
Previous studies [9,10] have indeed shown that the growth and 
properties of transportation networks are tightly linked to the 
characteristics of urban environment. Levinson [9] for instance, 
showed that rail development in London followed a logic of both 
'induced supply' and 'induced demand'. In other words, while the 
development of rail systems within cities answers a need for 
transportation between different areas, this development also has 
an impact on the organisation of the city. Therefore, while the 
growth of transportation systems cannot be understood without 
considering the underlying city, the development of the city cannot 
be understood without considering the transportation networks 
that run through it. As a result, the subway system and the city can
be thought of as two systems exhibiting a symbiotic behaviour.
Understanding this behaviour is crucial if we want to gain deeper 
insights into the growth of cities and how the mobility patterns 
organise themselves in urban environments. 

At a different scale, railway networks answer a need for fast 
transportation between different urban centers, and we therefore 



expect their properties to be linked to the characteristics of the 
underlying country. A model of growth has been recently 
proposed [11], and relates the existence of a given line to the 
economical and geographical features of the environment. An 
interesting question is thus to know whether subways and railway 
networks behave in the same way, but at different scales. In other 
words, we are interested to know whether subways are merely 
scaled down railway networks, or whether they are fundamentally 
different objects, following different growth mechanisms. Also, the 
existence of scaling between the system's output and its size is 
important as it suggests that very general processes are governing 
the growth of these networks [12,13]. 

Although many studies [3,5,14] explore the interplay between 
regional characteristics and the structure of transportation 
networks, a simple picture relating the network's most basic 
quantities and the region's properties is still lacking. In the spirit of 
what has recently been done for cities [13] and for railway 
networks [11,15], we propose here a large-scale framework and try
to understand how subways and railway networks scale with some 
of the substrates' most basic attributes: population, surface area 
and wealth. As a result, we are able to relate the total ridership, the
number of stations and the length of the network to socio-economical
features of the environment. We find that these relations are in
good agreement with the data gathered for 138 subway systems
and 58 railway networks across the world. In particular, we show
that even if the main mechanisms are the same, the fact that both 
systems operate at different scales is responsible for their different 
behaviors. We believe this should lay the foundations for more 
specific and involved discussions. 
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Results 

Framework 

A transportation network is at least characterized by its total 
number of nodes (which are here train or subway stations), its total 
length, and the total (yearly) ridership. On the other hand, a city 
(or a country in the railway case) is characterized by its area, its 
population and its Gross Domestic Product (GDP). Because 
transportation systems do not grow in empty space, but result from 
multiple interactions with the substrate, an important question is 
how network characteristics and socio-economical indicators relate 
to one another. Naturally, a cost-benefit analysis seems to be the 
appropriate theoretical framework. This approach has been 
developed in the context of the growth of railway networks 
[11,15], and in these studies an iterative growth was considered: at 
each step an edge e is built such that the cost function 

$$Z_e = B_e - C_e \tag{1}$$

is maximum. The quantity $B_e$ is the expected benefit and $C_e$ the
expected cost of edge e. In the following, we consider networks 
after they have been built, and we assume that they are in a 
'steady-state' for which we can write a cost function of the form 

$$Z = \sum_e Z_e = B - C \tag{2}$$

where B is the total expected benefits and C the total expected 
costs, mainly due to maintenance (in the steady state regime). We 
further assume that, during this steady-state, operating costs are 
balanced by benefits. In other words 

$$Z \simeq 0 \tag{3}$$

Indeed, because lines and stations cost money to maintain,
we expect the network to adapt to the way it is being used.
Therefore we can reasonably expect that, at first order, the cost of
operating the system is compensated by the benefits gained from
its use. In the following we will apply this general framework to
subway and railway networks in order to determine the behavior
of various quantities with respect to population and GDP.

Subways

In the case of subways, the total benefits in the steady-state are
simply connected to the total ridership $R$ and the ticket price $f$
over a given period of time. The costs, on the other hand, are due
to the maintenance costs of the lines and stations, so that we can
write (for a given period of time)

$$Z_{sub} = R f - \epsilon_L L - \epsilon_S N_s \tag{4}$$

where $L$ is the total length of the network, $\epsilon_L$ the maintenance cost
of a line per unit of length, $N_s$ the total number of stations and
$\epsilon_S$ the maintenance cost of a station (for a given period of time).

It is usually difficult to estimate the ridership of a system given
its characteristics and those of the underlying city. Due to the
importance of such estimates for planning purposes, the problem
of estimating the number of boardings per station given the
properties of the area surrounding the stations has been the subject
of numerous studies [16,17]. Here we are interested in the
dependence of the global, average behavior of the ridership on the
network and the underlying city. Very generally, we write that the
number $R_i$ of people using the station $i$ will be a function of the
area $C_i$ serviced by this station (the 'coverage' [3]) and of the
population density $\rho = P/A$ in the city

$$R_i = \xi_i C_i \rho \tag{5}$$

where $\xi_i$ is a random number of order one representing the
fraction of people that are in the area serviced by the station and
who use the subway. The main difficulty is in finding the
expression of the coverage. It depends, a priori, on local
particularities such as the accessibility of the station, and should
thus vary from one station to another. We take here a simple
approach and assume that on average

$$C_i \simeq \pi d_0^2 \tag{6}$$

where $d_0$ is the typical size of the attraction basin of a given station.
If we assume that it is constant, the total ridership can be written as

$$R = \sum_i R_i \sim \xi \pi d_0^2 \rho N_s \tag{7}$$

where $\xi = \frac{1}{N_s} \sum_i \xi_i$ is of order 1.

We gathered the relevant data for 138 metro systems across the
world (see Materials and Methods), which we cross-verified when
possible with the data given by network operators. We plot the
ridership $R$ as a function of $\rho N_s$ on Fig. 1 (left) and observe that
the data are consistent with a linear behavior. We measure a slope of
800 km$^2$/year which gives an estimate for $d_0$

$$d_0 \simeq 500\ \mathrm{m} \tag{8}$$

We illustrate this result on Fig. 1 (right) by representing each
subway station of Paris with a circle of radius 500 m. So far, the
distance $d_0$ appears as an intrinsic feature of users' behavior: it
is the maximal distance that an individual would walk to go to a
subway station.
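As a rough numerical check of Eqs. (7) and (8), the sketch below inverts the fitted slope to obtain $d_0$. It assumes (this is not stated in the text) that $\xi$ counts trips per resident per day, so that the yearly prefactor is $365\,\xi\,\pi d_0^2$; the function name and the trial values of $\xi$ are illustrative choices only.

```python
import math

def attraction_radius_km(slope_km2_per_year, trips_per_person_per_day):
    """Invert R = xi * pi * d0**2 * rho * N_s for d0 (in km).

    Assumption (not from the paper): xi is expressed as trips per resident
    per day, so the yearly prefactor is xi * 365.
    """
    xi_per_year = trips_per_person_per_day * 365.0
    return math.sqrt(slope_km2_per_year / (xi_per_year * math.pi))

# Slope quoted in the text: 800 km^2/year.
for xi in (1.0, 2.0, 3.0):
    d0 = attraction_radius_km(800.0, xi)
    print(f"xi = {xi:.0f} trip(s)/day -> d0 = {1000 * d0:.0f} m")
```

Under this assumption, one to three daily trips per resident give a radius between roughly 0.5 and 0.8 km, the same order of magnitude as the value quoted in Eq. (8).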

The average interstation distance $\ell_1$ is another distance
characteristic of the subway system. Rigorously, this distance
depends on the average degree $\langle k \rangle$ of the network so that
$\ell_1 = \frac{2L}{N_s \langle k \rangle}$. It has however been found [7] that for the 13
largest subway systems in the world, $\langle k \rangle \in [2.1, 2.4]$, so that we
can reasonably take $\langle k \rangle / 2 \approx 1$ and thus

$$\ell_1 \simeq \frac{L}{N_s} \tag{9}$$

The interstation distance depends in general on many
technological and economical parameters, but we expect that for
a properly designed system it will match human constraints.
Indeed, if $d_0 \ll \ell_1$, the network is not dense enough and in the
opposite case $d_0 \gg \ell_1$, the system is not economically interesting.
We can thus reasonably expect that the interstation distance
fluctuates slightly around an average value given by twice the
typical station attraction distance $d_0$

$$\ell_1 \simeq 2 d_0 \tag{10}$$
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Figure 1. (Subway) Relationship between ridership and coverage. (Left) We plot the total yearly ridership as a function of pNs. A linear fit 
on the 138 data points gives $R \approx 800\,\rho N_s$ ($R^2 = 0.76$) which leads to a typical effective length of attraction $d_0 \simeq 500$ m per station. (Right) Map of Paris
(France) with each subway station represented by a red circle of radius 500 m.
doi:10.1371/journal.pone.0102007.g001



It follows from this assumption that the interstation distance is
constant and independent from the population size. In order to
test our assumption, we plot on Fig. 2 (left) the total length of
subway networks as a function of the number of stations. The data
agree well with a linear fit $L \sim 1.13\,N_s$ ($R^2 = 0.93$). We also plot on
Fig. 2 (right) the normalized histogram of the inter-station length,
showing that the interstation distance is indeed narrowly
distributed around an average value $\ell_1 \simeq 1.2$ km with a standard deviation
$\sigma \simeq 400$ m, consistently with the value found above for
$d_0 \simeq 500$ m. The outliers are San Francisco, whose subway system
is more of a suburban rail service, and Dalian, a very large Chinese
city whose metro system is very young and still under development.

As a result of the previous argument, we can express $\ell_1$ in terms
of the system's characteristics. Indeed, the total ridership now reads

$$R \sim \xi \pi \rho \frac{\ell_1^2}{4} N_s = \xi \pi \rho \frac{L^2}{4 N_s} \tag{11}$$

If we assume to be in the steady-state $Z_{sub} \simeq 0$, using the results
from Eqs. (4,11), we find that the total length of the network and
the number of stations are linked at first order in $\epsilon_S/\epsilon_L$ by

$$L \simeq \left( \frac{4 \epsilon_L}{\xi \pi \rho f} + \frac{\epsilon_S}{\epsilon_L} \right) N_s \tag{12}$$
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Figure 2. (Subway) Relation between the length and the number of stations. (Left) Length of 138 subway networks in the world as a 
function of the number of stations. A linear fit gives $L \sim 1.13\,N_s$ ($R^2 = 0.93$). (Right) Empirical distribution of the inter-station length. The average
interstation distance is found to be $\ell_1 \simeq 1.2$ km and the standard deviation is approximately 440 m.
doi:10.1371/journal.pone.0102007.g002
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and that the interstation distance reads 

$$\ell_1 = \frac{4 \epsilon_L}{\xi \pi \rho f} + \frac{\epsilon_S}{\epsilon_L} \tag{13}$$

This relation implies that the interstation distance increases with 
the station maintenance cost, and decreases with increasing line 
maintenance costs, density and fare. We thus see that the
adjustment of $\ell_1$ to match $2 d_0$ can be made through the fare
price (or subsidies by the local authorities or national government). 
At this point, it would be interesting to get reliable data about the 
maintenance costs and fares for subway systems in order to pursue 
in this direction and to test the accuracy of this prediction. 
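To see where a relation of this form comes from, one can set $Z_{sub} = 0$ in Eq. (4) with $R \sim \xi \pi (\ell_1/2)^2 \rho N_s$ and $\ell_1 = L/N_s$, which gives a quadratic equation for $\ell_1$. The sketch below solves it exactly and compares the root with its expansion at first order in the station cost. All numerical values are made up for illustration, since reliable cost data are not available, and the function names are not from the paper.

```python
import math

def interstation_exact(xi, rho, fare, eps_line, eps_station):
    """Positive root of a*l**2 - eps_line*l - eps_station = 0, a = xi*pi*rho*fare/4."""
    a = xi * math.pi * rho * fare / 4.0
    return (eps_line + math.sqrt(eps_line**2 + 4.0 * a * eps_station)) / (2.0 * a)

def interstation_first_order(xi, rho, fare, eps_line, eps_station):
    """Expansion of the exact root at first order in eps_station."""
    return 4.0 * eps_line / (xi * math.pi * rho * fare) + eps_station / eps_line

# Hypothetical values (arbitrary but consistent units), chosen only so that
# station costs are a small correction to line costs.
params = dict(xi=1.0, rho=5000.0, fare=1.0, eps_line=4000.0, eps_station=100.0)
exact = interstation_exact(**params)
approx = interstation_first_order(**params)
print(f"exact root: {exact:.4f}, first-order estimate: {approx:.4f}")
```

For these illustrative inputs the two values differ by well under one percent; the gap grows as the station cost becomes comparable to the line cost.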

So far, we have a relation between the total length and the 
number of stations, but we need another equation in order to 
compute their value. Intuitively, it is clear that the number of 
stations — or equivalently the total length - of a subway system is 
an increasing function of the wealth of the city. We assume a 
simple, linear relation of the form 

$$N_s = \beta \frac{G}{\epsilon_S} \tag{14}$$

where $G$ is the city's Gross Metropolitan Product (GMP), and $\beta$
the fraction of the city's wealth invested in public transportation.
This relation can equivalently be interpreted as a proportional
relation between the number of stations per person and the city's
development, as measured by its GMP per capita. On Fig. 3 (left)
we plot the number of stations of different metro systems around 
the world as a function of the Gross Metropolitan Product of the 
corresponding city. A linear fit agrees relatively well with the data
($R^2 = 0.73$, dashed line), and gives $\epsilon_S/\beta \approx 10^{10}$ dollars/station.

However, the dispersion around the linear average behaviour is 
important: more specific data is needed in order to investigate 
whether differences in the construction costs and investments (or 
the age of the system) can explain the dispersion, or if other 
important parameters need to be taken into account. Incidentally, 



another possibility would be to assume that the size of the system 
depends on the age of the system or the development of the city 
(measured by the GMP per capita). However, in both cases, we 
found poor correlations. At this stage, we thus conclude that the 
number of stations (respectively the density of stations) mostly 
depends on the total GMP (respectively the GMP per capita). 

Finally, we consider the number of different lines with distinct
tracks. A natural question is how the number of lines $N_{lines}$ scales
with the number of stations $N_s$, that is to say whether lines get
proportionally smaller, larger or stay the same as the size of the
whole system grows. We plot the number of lines as a function of the number of stations
on Fig. 3 (right) and find that the data agree with a linear
relationship between both quantities ($R^2 = 0.93$). In other words,
the number of stations per line is distributed around a typical value
of 19, whatever the size of the system.
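The linear fits quoted in this section are proportionality relations. A minimal sketch of such a fit, assuming an ordinary least-squares line forced through the origin (the text does not specify the fitting procedure) and using made-up data points rather than the actual dataset:

```python
def slope_through_origin(xs, ys):
    """Least-squares slope b of y = b*x (no intercept): b = sum(x*y) / sum(x*x)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Hypothetical (stations, lines) pairs, NOT taken from the paper's dataset.
stations = [20, 60, 100, 200, 300]
lines = [1, 3, 5, 11, 16]

b = slope_through_origin(stations, lines)
print(f"lines per station: {b:.4f}, stations per line: {1 / b:.1f}")

# With the slope reported in the text (0.053 lines per station):
print(f"{1 / 0.053:.1f} stations per line")
```

The inverse of the reported slope, 1/0.053, rounds to the typical value of 19 stations per line given above.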

Railway networks 

We first discuss an important difference between railway and
subway networks. In the subway case, the interstation distance $\ell_1$ is
such that it matches human constraints: $\ell_1 \simeq 2 d_0$ where $d_0$ is the
typical distance that one would walk to reach a subway station. For
the railway network, the logic is however different: while subways
are built to allow people to move within a dense urban
environment, the purpose of building a railway is to connect
different cities in a country. In addition, due to the long distance
and hence high costs, it seems reasonable to assume that each city
is connected to its closest neighbouring city. In this respect, the
railway network appears as a planar graph connecting, in an
economical way, randomly distributed nodes (cities) in the plane. If
we assume that a country has an area $A$ and $N_s$ train stations, the
typical distance between nearest stations is

$$\ell_N \sim \sqrt{\frac{A}{N_s}} \tag{15}$$








Figure 3. (Subway) Size of the subway system and city's wealth. (Left) We plot the number of stations for the different subway systems in the
dataset as a function of the Gross Metropolitan Product of the corresponding cities (obtained for 106 subway systems). A linear fit (dashed line) gives
$N_s = 2.51 \times 10^{-10}\,G$ ($R^2 = 0.73$). (Right) Number of lines and number of stations. We plot the number of metro lines $N_{lines}$ as a function
of the number of stations $N_s$. A linear fit on the 138 data points gives $N_{lines} \approx 0.053\,N_s$ ($R^2 = 0.93$), or, in other words, metro lines comprise on average
19 stations.

doi:10.1371/journal.pone.0102007.g003
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Figure 4. (Train) Total length and number of stations. Total length of the national railway network L rescaled by the typical size of the country 
$\sqrt{A}$ as a function of the number of stations $N_s$. The dashed line shows the best power-law fit on the 50 data points with an exponent
$0.50 \pm 0.08$ ($R^2 = 0.87$).
doi:10.1371/journal.pone.0102007.g004



The total length $L \sim N_s \ell_N$ is then given by

$$L \sim \sqrt{A N_s} \tag{16}$$



In order to test this relation for different countries, we plot the
rescaled length $L/\sqrt{A}$ as a function of $N_s$ on Fig. 4. A power-law fit gives an exponent
$0.50 \pm 0.08$ ($R^2 = 0.87$), which is consistent with the previous
argument.
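The square-root scaling can be illustrated numerically. The sketch below uses uniformly random points in a square as a stand-in for cities (an assumption made here for illustration, not the paper's data), measures the mean nearest-neighbour distance for several numbers of points, and estimates the exponents by a least-squares fit in log-log coordinates. Helper names, the seed and the sample sizes are arbitrary.

```python
import math
import random

def mean_nearest_neighbour(n_points, area, rng):
    """Mean nearest-neighbour distance for n_points drawn uniformly in a square."""
    side = math.sqrt(area)
    pts = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_points)]
    total = 0.0
    for i, (xi, yi) in enumerate(pts):
        total += min(math.hypot(xi - xj, yi - yj)
                     for j, (xj, yj) in enumerate(pts) if j != i)
    return total / n_points

def loglog_slope(xs, ys):
    """Ordinary least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den

rng = random.Random(0)          # fixed seed so the run is reproducible
area = 1.0                      # arbitrary units
sizes = [50, 100, 200, 400]
ell = [mean_nearest_neighbour(n, area, rng) for n in sizes]
length = [n * l for n, l in zip(sizes, ell)]   # L ~ N_s * ell_N

print("exponent of ell_N vs N_s:", loglog_slope(sizes, ell))
print("exponent of L vs N_s:   ", loglog_slope(sizes, length))
```

The first exponent should come out close to -1/2 and the second close to +1/2, up to sampling noise and boundary effects of the finite square.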

At this point, we have a relation between $L$ and $N_s$, but we need
to find expressions for the other quantities. In contrast with
subway systems, due to the distances involved, the ticket price usually
depends on the distance travelled and we denote by $f_L$ the ticket
price per unit distance. The relevant quantity for benefits is
therefore not the raw number of passengers, as in subways, but
rather the total distance travelled on the network $T$. Also, again
due to the long distances spanned by the network, the costs of
stations can be neglected as a first approximation, and we get for
the budget the following expression



$$Z_{train} = T f_L - \epsilon_L L \tag{17}$$



In the steady-state regime $Z_{train} \simeq 0$, or in other words the
revenue generated by the network use must be of the order of the
total maintenance costs [11], which leads to

$$T \sim \frac{\epsilon_L}{f_L} L \tag{18}$$



In addition, if we assume that the order of magnitude of a trip is
given by $\ell_N$, the total travelled length is simply proportional to the
ridership, $T \sim \ell_N R$, leading to

$$R \sim \frac{\epsilon_L}{f_L} N_s \tag{19}$$



We thus plot the total daily ridership $R$ as a function of the total
number of stations $N_s$ (Fig. 5), and despite the small number of
available data points, a linear relationship between the two



PLOS ONE | www.plosone.org | July 2014 | Volume 9 | Issue 7 | e102007



Figure 5. (Train) Ridership and number of stations. The total yearly ridership R of the railway networks as a function of the number of stations. 
A linear fit on the 47 data points gives $R \sim 7.0 \times 10^{[\ldots]}\,N_s$ ($R^2 = 0.86$)
doi:10.1371/journal.pone.0102007.g005



quantities seems to agree with empirical data on average
($R^2 = 0.86$). This result should be taken with caution, however,
due to the large dispersion that is observed around the
average behaviour, and the small number of observations.
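The chain of relations leading to the ridership estimate can be written out step by step. The sketch below sets every proportionality constant to 1 and uses hypothetical inputs (the areas, costs and fares are not from the paper); it only illustrates that, under these assumptions, the ridership depends on the number of stations and not on the area.

```python
import math

def railway_estimates(area_km2, n_stations, eps_line, fare_per_km):
    """Chain Eqs. (15)-(19) with all proportionality constants set to 1."""
    ell_n = math.sqrt(area_km2 / n_stations)      # typical distance between stations
    length = n_stations * ell_n                   # total length, equals sqrt(A * N_s)
    travelled = eps_line * length / fare_per_km   # steady state: T * f_L = eps_L * L
    ridership = travelled / ell_n                 # typical trip length taken as ell_n
    return ell_n, length, travelled, ridership

# Hypothetical inputs: same number of stations, very different areas.
small = railway_estimates(1.0e5, 1000, eps_line=2.0, fare_per_km=0.1)
large = railway_estimates(1.0e6, 1000, eps_line=2.0, fare_per_km=0.1)
print(small)
print(large)
```

The two countries end up with different lengths and travelled distances but the same ridership, which is the content of the linear relation between $R$ and $N_s$.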

According to the previous result, the total length and the
number of stations are related to each other. We now would like to
understand what property of the underlying country determines
the total length of the network, that is to say, why networks are
longer in some countries than in others. As in subway systems,
economical reasons seem appealing. Indeed, the railway networks
of some large African countries such as Nigeria are much smaller
than those of countries of similar surface area such as France or
the UK. A priori, when estimating the cost of a railway network, one
should take into account both the costs of building lines and the
stations. However, as stated above, considering the distances
involved, the cost of building a station is negligible compared to
that of building the actual lines. We thus can reasonably expect to
have

$$L \sim \alpha \frac{G}{\epsilon_L} \tag{20}$$



where $G$ is here the country's Gross Domestic Product (GDP) used
as an indicator of the country's wealth, and $\alpha < 1$ the fraction of the
GDP invested in railway transportation. We plot $L$ as a function of
$G$ on Fig. 6 and the data agree well ($R^2 = 0.91$) with a linear
dependence between $L$ and $G$ (note that we have more points here
due to the fact that the data about the total length of a railway
network is easier to get). Again, the dispersion indicates that the
linear trend should only be understood as an average behaviour
and that local particularities can produce the large
deviations observed. For instance, the United Arab
Emirates are far from the average behaviour, with a 52 km
network and a GDP of roughly $3 \times 10^5$ million dollars. Yet, the
construction of a 1,200 km railway network was decided in
2010, which would bring the country closer to the average
behaviour. As in the case of subways, we also tried to see whether
$L$ could better be explained by the development of the country, as
measured by its GDP per capita, but we did not find any significant
correlation.

Discussion 

We observed scaling relations for global properties of railways 
and subways and the existence of such relations suggests that basic, 
common mechanisms are at play during their evolution. A 
probable reason for the presence of these systems is the mobility 
demand and their structure is driven by economical mechanisms 
that seem to be the same for all countries, independently from any 
cultural, or historical considerations. The fact that macroscopic 
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Figure 6. (Train) Total length of the network and wealth. Total length of the railway network L as a function of the country GDP G. The 
dashed line shows the linear fit on the 138 data points which gives $\epsilon_L/\alpha \approx 10^{[\ldots]}$ dollars km$^{-1}$ ($R^2 = 0.91$).
doi:10.1371/journal.pone.0102007.g006



properties seem to be independent from specific details opens the 
possibility for simple modelling, and in this spirit, we have 
proposed a general framework to connect the properties of railway 
and subway systems (ridership, total length and number of stations) 
to the socio-economic and spatial characteristics (population, area, 
GDP) of the country or city where they are built. Despite their 
simplicity, our arguments agree satisfactorily with data we
gathered for almost 140 subway systems and 50 railway networks
across the world. As a result, and maybe surprisingly, the
knowledge of simple characteristics of a country or a city is
enough to give an estimate of the size and use of its transportation
system.

It should be noted that the noise associated with the data (and
sometimes their definition, see Materials and Methods) makes it
difficult to infer behaviours from the empirical analysis alone.
Therefore, the most appropriate way to proceed, we believe, is to 
make assumptions about the systems and build a model whose 
predictions can then be tested against data. 

This study suggests that the fundamental difference between 
railways and subways comes from the determination of the 
interstation distance. While it is imposed by human constraints in 
the subway case, the railway network has to adapt to the spatial 
distribution of cities in a country. This remark is at the heart of the 



different behaviors observed for railways and subways (see Table 1 
for a summary of these differences). 

The previous arguments are able to explain the average 
behaviour of various quantities. Nevertheless, it would be 
interesting to identify deviations from these behaviours, and see 
as suggested in [3] whether they are correlated with topological 
properties of the system, or other properties of the network and the 
region. We think, however, that the relations presented here
provide a simple framework within which local particularities
can be discussed and understood. We also think that this
framework could serve as a useful null-model to quantify the 
efficiency of individual transportation networks, and compare 
them to each other. This would however require more specific 
data than those that were available to us. 

While we have focused on an average, static description of 
metro systems, we believe that our study provides a better 
understanding of how these systems interact with the region they 
serve. This new insight is a necessary step towards a model for the 
growth of subway systems that takes the characteristics of the city 
into account. Indeed, although models of network growth exist,
the length of networks and the number of nodes at a given time are usually imposed
exogenously, instead of being linked to the socio-economic
properties of the substrate. This study provides a simple approach
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Table 1. Summary of the differences between subways and 
railways.

              Subway            Train
$L/N_s$       const.            $\sqrt{A/N_s}$
$R$           $\rho N_s$        $N_s$
$G$           $N_s$             $L$

We summarize the difference of behaviour between subways and railways. The 
scaling of the average interstation length L/Ns of the network with the number 
of stations reveals the different logics behind the growth of these systems. 
Another difference lies in the total ridership R: while it depends also on the 
population density P/A for subways, it only depends on the number of stations 
for train networks. Finally, the size of both types of networks can be 
expressed as a function of the wealth of the region, represented here by the 
GDP G. However, because the interstation length is constant for subways, the
size can be expressed in terms of the number of stations $N_s$, or the length. In
the railway networks case, the cost of stations is negligible compared to the
building cost of lines, and the size is expressed in terms of the total length L.
doi:10.1371/journal.pone.0102007.t001

to these complex problems and could help in building more
realistic models, with fewer exogenous parameters.

It would also be interesting to gather data about the exact 
structure of all the networks, to study whether there is a 
relationship between their topology (degree distribution, detour 
index, etc.) and properties of the substrate, as was done for the 
road network in [5]. 

Finally, gathering historical data should make it possible to address the
problem of the conditions for the appearance of a subway in a city.
Indeed, we observe empirically that the GDP of the cities that 
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have a subway system is always larger than about 10^^ dollars, a 
fact that calls for a theoretical explanation. 

Materials and Methods 

Data for 138 subways across the world were mainly collected
on Wikipedia [18], and cross-referenced with the operators' data
when possible. The cities' GDP per capita was retrieved for 114
cities from the Brookings Global MetroMonitor [19]. The choice of
population and city area was more subtle. Indeed, most subway 
systems span an area greater than the city core, and the relevant 
area therefore lies somewhere between the city core's area and the 
total urbanized area. We chose to use the population and surface 
area data for urbanized areas provided by Demographia [20]. 

While data about ridership and network length were easily
retrievable for more than 100 countries from the UIC Railisa
2011 database [21], data about the number of stations were more
difficult to find. We had to use various data sources, mainly
scraping the operators' ticket booking websites. Data about the
GDP, population and surface areas of different countries were 
obtained from the World Bank [22], and the United Nations 
Statistics Division [23]. 

All the data used for this study are publicly available in tsv 
format at [24]. 
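As an illustration of how tab-separated data of this kind could be turned into the variable $\rho N_s$ used for the subway ridership fit, here is a minimal sketch. The column names and the two sample rows are hypothetical and do not reproduce the published files.

```python
import csv
import io

# Hypothetical sample in TSV form; column names and values are illustrative
# and do not reproduce the published files.
SAMPLE = (
    "city\tpopulation\tarea_km2\tstations\n"
    "A\t2000000\t500\t60\n"
    "B\t8000000\t1600\t250\n"
)

def coverage_variable(tsv_text):
    """Return {city: rho * N_s} with rho = population / area (people per km^2)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {
        row["city"]: float(row["population"]) / float(row["area_km2"]) * int(row["stations"])
        for row in reader
    }

print(coverage_variable(SAMPLE))
```

With real files, the in-memory string would be replaced by an opened file handle and the column names adapted to the actual headers.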
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